Determining Descriptive Attributes for Listing Locations

ABSTRACT

Listings and reviews of listings can be processed to identify descriptive attributes for locations associated with the listings. To do this, a corpus of words is generated for various locations based on listings in the locations and reviews of those listings. An expected frequency and per-location frequencies are determined for each word. These values are in turn used to determine, for each word, a number of high frequency listing locations and a number of below expected frequency listing locations. Based on a comparison of the ratio of the number of high frequency listing locations to the number of below expected frequency listing locations for a word with an attribute reference number, the word can be identified either as an attribute that is likely descriptive of the location, or not.

CROSS REFERENCE TO RELATED APPLICATIONS

This application is a continuation of prior, co-pending U.S. patent application Ser. No. 14/800,369, filed Jul. 15, 2015, which claims the benefit of U.S. Provisional Application No. 62/025,397, filed Jul. 16, 2014, both of which are incorporated by reference in their entirety.

BACKGROUND

Travel booking systems often include functionality for providing travel locations to users. When programmatically providing travel locations to a user, it is important to present contextual and interesting content describing the locations such that the user is motivated to travel to the location. In some travel booking systems, such contextual and interesting content is identified using methods based on words that appear with high frequency in travel descriptions and reviews. While this technique surfaces terms that are popular in listings and, thus, may provide some contextual insights into the location, the terms are often too generic and do not particularly describe attributes of the location that may be of most interest to a user to whom the location is being provided. For example, this technique would surface the terms ‘apartment’ and ‘Muni’ from listings located in San Francisco, since these terms are commonly used in listings and reviews of San Francisco locations. While these terms are descriptive of and associated with San Francisco, they do not identify attributes of San Francisco that would entice a prospective traveler to visit the city.

SUMMARY

Listings and reviews of listings can be processed to identify descriptive attributes for locations associated with the listings, where the listings describe goods or services and each listing is associated with a geographic location. To do this, a corpus of words is generated for various locations based on listings in the locations and the reviews of those listings. For each word, an expected frequency, one or more per-location frequencies, a number of high frequency listing locations, a number of below expected frequency listing locations, and a descriptiveness metric are determined. For a given word, the number of high frequency listing locations is the number of locations where the per-location frequency of the word is greater than a first multiple of the expected frequency for that word. Similarly, for a given word, the number of below expected frequency listing locations is the number of locations where the per-location frequency of the word is less than a second, smaller multiple of the expected frequency. The descriptiveness metric of a word is based on the number of high frequency listing locations and the number of below expected frequency listing locations for that word. By comparing the descriptiveness metrics of the words in the corpus with an attribute reference number, some of the words in the corpus can be labeled as “attributes,” meaning they are interpreted to be descriptive of locations, such that if a user of the system were provided with a listing labeled with one or more of the attributes, the attributes would help the user understand the characteristics of the location in which the listing is located.

BRIEF DESCRIPTION OF THE DRAWINGS

The invention has advantages and features which will be readily apparent from the following detailed description of the invention and the appended claims, when taken in conjunction with the accompanying drawings, in which:

FIG. 1 is a block diagram of a computing environment, according to one embodiment.

FIG. 2 is a block diagram of an online booking system for browsing and booking listings having specific locations, according to one embodiment.

FIG. 3 is a flow chart for processing listings and reviews to identify descriptive attributes for listing locations, according to one embodiment.

FIGS. 4A and 4B illustrate exemplary interfaces that use the attributes identified as meaningfully describing a location.

DETAILED DESCRIPTION

System Overview

FIG. 1 is a block diagram of a computing environment, according to one embodiment. FIG. 1 and the other figures use like reference numerals to identify like elements. A letter after a reference numeral, such as “113A,” indicates that the text refers specifically to the element having that particular reference numeral. A reference numeral in the text without a following letter, such as “113,” refers to any or all of the elements in the figures bearing that reference numeral (e.g. “113” in the text refers to reference numerals “113A” and/or “113B” in the figures).

The network 105 represents the communication pathways between users 103 (e.g., consumers) and the online booking system 111. In one embodiment, the network is the Internet. The network can also utilize dedicated or private communication links (e.g. wide area networks (WANs), metropolitan area networks (MANs), or local area networks (LANs)) that are not necessarily part of the Internet. The network uses standard communications technologies and/or protocols.

The client devices 101 are used by the users 103 for interacting with the online booking system 111. A client device 101 is a device that is or incorporates a computer such as a personal computer (PC), a desktop computer, a laptop computer, a notebook, a smartphone, or the like. A computer is a device having one or more processors, memory, storage, and networking components (either wired or wireless). The client device 101 executes an operating system, for example, a Microsoft Windows-compatible operating system (OS), Apple OS X or iOS, a Linux distribution, or Google's Android OS. In some embodiments, the client device 101 may use a web browser 113, such as Microsoft Internet Explorer, Mozilla Firefox, Google Chrome, Apple Safari and/or Opera, as an interface to interact with the online booking system 111. In other embodiments, the client device 101 may execute a dedicated application for accessing the online booking system 111. When executing either a browser 113 or a dedicated application to interface with the online booking system 111, the client device 101 is configured to and operates as a particular, special purpose device.

The online booking system 111 includes a web server 109 that presents web pages or other web content that form the basic interface visible to the users 103. Users 103 use respective client devices 101 to access one or more web pages, and provide data to the online booking system 111 via the interface.

The online booking system 111 may be utilized, for example, as an accommodation reservation system, a dining reservation system, a rideshare reservation system, a retail system, and the like. More generally, the online booking system 111 provides users with access to an inventory of consumable resources (e.g. goods and services) that are available to consumers, where the resources are typically available only for a limited duration, and the real world, physical location of each resource is considered as a factor in the consumer's decision to consume (e.g., purchase, license, or otherwise obtain) the resource. Generally, resources available at some locations are more desirable than otherwise identical resources available at other locations. Resources include accommodations, restaurants, vehicles, attractions (e.g., shows, events, and tourist attractions), shopping centers and the like. For example, in an online booking system 111 that provides accommodations, accommodations in particular neighborhoods may be more or less desirable than otherwise identical accommodations in other neighborhoods: a given neighborhood may be considered more interesting, more prestigious, safer, or as having some other quality that consumers deem valuable when selecting accommodations.

In some embodiments, the online booking system 111 facilitates transactions between users 103. For example, an accommodation reservation system allows users 103 to book accommodations provided by other users of the accommodation reservation system. A rideshare reservation system allows users 103 to book rides from one location to another. An online marketplace system allows users 103 to buy and/or sell goods or services face to face with other users. The online booking system 111 comprises additional components and modules that are described below.

Online Booking System Overview

FIG. 2 is a block diagram of an online booking system for browsing and booking listings having specific locations, according to one embodiment. The online booking system 111 includes a database 201, a listing module 203, a search module 205, a booking module 207, a review module 209, a location discovery module 211, and an attribute identification module 211.

Those of skill in the art will appreciate that the online booking system 111 will contain other modules appropriate for its functionality (e.g., social networking, banking, commerce, etc.), but that are not described herein, since they are not directly material to the invention. In addition, conventional elements, such as firewalls, authentication and encryption systems, network management tools, load balancers, and so forth are not shown as they are not material to the invention. The online booking system 111 may be implemented using a single computer, or a network of computers, including cloud-based computer implementations. The computers are preferably server class computers including one or more high-performance computer processors and main memory, and running an operating system such as LINUX or variants thereof. The operations of the system 111 as described herein can be controlled through either hardware or through computer programs installed in non-transitory computer storage and executed by the processors to perform the functions described herein. The database 201 is implemented using non-transitory computer readable storage devices, and suitable database management systems for data access and retrieval. The database 201 is implemented in a database management system, such as a relational database (e.g., MySQL). The online booking system 111 includes other hardware elements necessary for the operations described here, including network interfaces and protocols, input devices for data entry, and output devices for display, printing, or other presentations of data. As will become apparent below, the operations and functions of the online booking system 111 are sufficiently complex as to require their implementation on a computer system, and cannot be performed as a practical matter in the human mind. The database 201 maintains tables appropriate to the type of resources being offered, and the users of the system.
Thus, in one embodiment for an accommodation reservation system, the database includes a host table (storing records for users who are hosts providing listings), a guest table (storing records of users who are guests licensing listings from hosts), a listing table (storing records of available properties for license), a booking table (storing records of listings that have been licensed by a host to a guest), a transactions table (storing transaction information, such as payments), a messages table (storing records of messages between hosts and guests), a reviews table (storing records of reviews of guests by hosts, and of hosts by guests), and other administrative and management tables.

The listing module 203 provides a user interface and processing logic for users to list goods or services for purchase or license to other users, and is one means for doing so. For example, if the online booking system 111 is an accommodation reservation system, then the listing module 203 provides a user interface suitable for listing accommodations, such as houses, apartments, condominiums, rooms, treehouses, castles, tents, couches, and sleeping spaces, which information is stored in the listing table. If the online booking system 111 is a dining reservation system, then the listing module 203 provides a user interface for listing available reservations at restaurants, entertainment venues, resorts, etc. If the online booking system is a rideshare reservation system, then the listing module 203 provides a user interface for listing available rides.

The listing module 203 is configured to receive a listing from a user describing the good or service being offered, a time frame of its availability, a price, a location, and other relevant factors, and stores this information in the listing table. For example, for an accommodation reservation system, a listing in the listing table includes fields for a type of accommodation (e.g. house, apartment, room, sleeping space, other), a representation of its size (e.g., square footage, or number of rooms), the dates that the good or service is available, and a booking rate (e.g., per night, week, month, etc.). The listing module 203 allows the user to include additional information describing the good or service including photographs and other media. The location information for a listing provides specific reference to a physical location or area in the real world, and may include a country, state, city, and neighborhood of the listing, geographical coordinates, mailing addresses, or other suitable location specifying information. The listing module 203 is also capable of converting one type of location information (e.g., mailing address) into another type of location information (e.g., country, state, city, and neighborhood) using externally available geographical map information. Listings created using the listing user interface are processed by the online booking system 111 and stored in the database 201 in the listing table. The listings may further be organized by location, based on physical proximity to each other, to particular landmarks, to traditional types of listing boundaries (e.g., city, state, county, country), according to an externally defined organizational structure (e.g. neighborhood), etc.

In some online booking systems 111, some listings are temporary, are available for booking one time only, and/or are capable of being deleted by the listing user. The listing module 203 stores these historical, unavailable listings in database 201 in a historical listing table. The online booking system 111 uses these historical listings to analyze the behaviors of users in creating, searching, ranking, and booking listings. Historical listings may be encrypted or otherwise protected so that they are not available to anyone other than the operator of the booking system 111.

The booking module 207 provides a user interface and processing logic for users to view and book listings created by other users. The booking module 207 receives payment information from booking users, and securely transmits the payments to listing users. Any user information transmitted as part of the purchase process is encrypted for user privacy and protection. Bookings are stored in the booking table. Upon completion of a booking, the booking is encrypted and stored as historical booking information in database 201 in the booking table or in a separate historical booking table.

The review module 209 provides a user interface and processing logic to receive reviews of the listings offered by other users, providing evaluations, feedback, and other commentary about a listing, and is one means for doing so. Completed reviews can be included within and appear alongside listings, so that future users interested in booking the listing can evaluate the listing with the reviews in mind. Reviews are stored in association with their listings in the database 201 in the reviews table. Similar to historical listings, reviews for historical listings may continue to be stored in database 201, in either the reviews table or a separate historical reviews table, after the listing is no longer available.

The search module 205 provides a user interface and processing logic for searching the database for listings responsive to a search query, and is one means for doing so. The user interface of the search module 205 is configured to receive a search query specifying various attributes of a desired good or service, such as type, location, price, and so forth. The search module 205 receives the user-specified attributes, and constructs a database query (e.g., a query in SQL or other database query language). The database query matches the attributes of the search query to listings in the listing table in database 201. The matched listings are then ranked using the ranking module 211. The search module 205 then provides the ranked set of listings to a client device, so that the user of the client device can access the listings in a convenient manner. The user interface of the search module 205 is capable of displaying the ranked set of listings by rank order.
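As a rough illustration of how user-specified attributes might be turned into a parameterized SQL query, the sketch below uses Python's built-in `sqlite3` module in place of a production database such as MySQL. The table name `listings`, its columns, and the function `search_listings` are hypothetical; ranking of the matched listings is not shown.

```python
import sqlite3

def search_listings(conn, city=None, max_price=None, listing_type=None):
    """Build a parameterized query from whichever search attributes were given.

    The `listings` table and its columns are hypothetical placeholders.
    """
    clauses, params = [], []
    if city is not None:
        clauses.append("city = ?")
        params.append(city)
    if max_price is not None:
        clauses.append("price <= ?")
        params.append(max_price)
    if listing_type is not None:
        clauses.append("type = ?")
        params.append(listing_type)
    sql = "SELECT id, type, city, price FROM listings"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    # Deterministic order here; a real system would rank the matches instead.
    sql += " ORDER BY id"
    return conn.execute(sql, params).fetchall()
```

Using `?` placeholders rather than string interpolation keeps user input out of the SQL text itself.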

Depending upon the implementation, the user interface for receiving a search query may be simple, allowing for as little as a single text string to be entered as the search query, or it may allow for multiple different kinds of predetermined and/or dynamic input options to be entered in the search query. The user interface provides for specification of a location for inclusion in the search query. The location may be auto-populated with the current location of the client device 101A the user is using to perform the search. Alternatively, the user may manually enter a location in the search query. The location may be specified as a country, state (or another regional equivalent such as a province, region, territory, canton, department, county, district, or prefecture), city, neighborhood, or other designation such as geographical coordinates (e.g., longitude, latitude), a street address, and a zip code.

In some scenarios, a user of the online booking system 111 is looking to discover new and interesting locations for travel and, thus, the search query provided by the user may be non-specific in terms of a location, but instead descriptive of attributes that the user is interested in. For example, instead of searching by location name (e.g., a search for “Lake Tahoe”), the user may search for “beach,” “sunsets,” or “water skiing,” since these are the attributes of interest to the user. The search module 205 identifies locations that may be of interest based on the attributes input by the user and displays the locations to the user in the user interface. In some embodiments, the search module 205 displays the locations in the user interface before the user provides any search query. To entice the user to explore listings, the display of each location includes relevant and interesting attributes about the location.

Location Attribute Identification

The attribute identification module 211 processes listings in the listing table and reviews in the reviews table to programmatically identify attributes descriptive of the listing locations, and is one means for doing so. Specifically, the attribute identification module 211 generates a collection of words that appear in the listings and reviews stored in the database 201 tables. The collection of unique words is referred to herein as the “corpus.” A word is “unique” simply in the sense that it appears once in the corpus. Each word in the corpus has related metadata, including frequency information that indicates how frequently the word appears in listings and reviews. The frequency information can be raw counts, or normalized values.

For each word in the corpus, the attribute identification module 211 computes a descriptiveness metric that indicates how well the word describes one or more attributes of locations that are meaningfully descriptive and would be of interest to a prospective visitor to a location. The attribute identification module 211 then determines, for each location having a listing in the database 201, a set of attributes based on words in the corpus whose descriptiveness metric falls within a threshold range, as described below. These will be the words that are deemed to be most descriptive of the location. For example, for each geographic location, such as Los Angeles, Chicago, San Francisco, Paris, or Las Vegas, the attribute identification module 211 outputs a set of words (e.g., 10 or 20 words) that are deemed to be most descriptive of that specific location.

In operation, to generate the corpus, the attribute identification module 211 retrieves every listing and every review of a listing from the database 201 (or a target subset thereof) and extracts the words that appear in the textual descriptions of the listings and the reviews. In one embodiment, this initial list is filtered to remove stop words (e.g., prepositions, articles, etc.) to form the corpus. The list is processed to determine each unique word. For each unique word in the corpus, the attribute identification module 211 computes an expected frequency based on the number of times the word appears in the corpus and the total number of words in the corpus. In one embodiment, the attribute identification module 211 computes the expected frequency of a word using the following formula:

${f(x)} = \frac{N_{x}}{N_{t}}$

where $f(x)$ is the expected frequency of word $x$, $N_x$ is the total number of times word $x$ appears in the corpus, and $N_t$ is the total number of words in the corpus.
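A minimal sketch of corpus generation and the expected-frequency formula, assuming listings and reviews are available as plain strings. The tiny `STOP_WORDS` set and the lowercase alphabetic tokenizer are illustrative assumptions, not part of the described system.

```python
import re
from collections import Counter

# Hypothetical stop-word list; a real system would use a much fuller one.
STOP_WORDS = {"a", "an", "the", "and", "or", "of", "in", "on", "to", "at", "with", "is", "was"}

def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def build_corpus_counts(documents):
    """Count every non-stop word across all listing and review texts (N_x per word)."""
    counts = Counter()
    for doc in documents:
        counts.update(w for w in tokenize(doc) if w not in STOP_WORDS)
    return counts

def expected_frequencies(counts):
    """Expected frequency f(x) = N_x / N_t for every unique word x in the corpus."""
    total = sum(counts.values())  # N_t
    return {word: n / total for word, n in counts.items()}
```

For example, the two texts "The beach is close to the pier" and "Great beach and sunsets" yield six non-stop words, two of which are "beach", so f("beach") = 2/6.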

In some embodiments, prior to computing the expected frequency, the attribute identification module 211 processes the corpus to filter out words that may introduce bias. For example, to avoid bias introduced by proper nouns, the attribute identification module 211 removes words from the corpus that appear as proper nouns. For example, “Sunset Boulevard,” “Hayes Valley,” and other proper nouns, such as place names, are removed. In one embodiment, the proper nouns are identified using a part-of-speech tagging (POST) technique. In another embodiment, the proper nouns are identified by matching words and bigram and trigram phrases against curated lists of proper names (e.g., WordNet or Wikipedia). Also, when computing the expected frequency, the attribute identification module 211 does not count words that appear in bigrams and trigrams as individual words, but does maintain frequency counts for the bigrams or trigrams themselves. For example, the bigram “jet skiing” is counted, but it does not affect the count of the underlying unigrams, and therefore the expected frequency, of the individual words “jet” or “skiing.” These bigrams and trigrams may be identified using an external resource, such as Wikipedia.
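The proper-noun filtering and phrase counting above can be sketched with a greedy longest-match scan over tokens. The `PHRASES` and `PROPER_NAMES` sets are hypothetical stand-ins for curated lists such as those derived from WordNet or Wikipedia, and only multi-word entries are handled in this sketch.

```python
from collections import Counter

# Hypothetical curated lists of multi-word phrases and proper names.
PHRASES = {("jet", "skiing"), ("wine", "tasting")}
PROPER_NAMES = {("sunset", "boulevard"), ("hayes", "valley")}

def count_terms(tokens, max_n=3):
    """Greedy longest match from trigrams down to bigrams.

    Proper names are dropped entirely; known phrases are counted as one term,
    and their component words are NOT counted as separate unigrams.
    """
    counts = Counter()
    i = 0
    while i < len(tokens):
        for n in range(max_n, 1, -1):
            gram = tuple(tokens[i:i + n])
            if len(gram) == n and (gram in PROPER_NAMES or gram in PHRASES):
                if gram in PHRASES:
                    counts[" ".join(gram)] += 1
                i += n  # skip past the matched phrase or name
                break
        else:
            # No multi-word match starting here: count the single word.
            counts[tokens[i]] += 1
            i += 1
    return counts
```

With the tokens of "we went jet skiing near hayes valley then skiing", the result counts "jet skiing" once and the standalone "skiing" once, never counts "jet", and omits "hayes" and "valley".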

Further, when computing the expected frequency, the attribute identification module 211 identifies words that often co-occur in the listings and reviews for a given location. For example, “winery” and “vineyards” co-occur across listings in Napa. Thus, for each location, the attribute identification module 211 maintains a co-occurrence matrix that stores the frequency of co-occurrences of words appearing in listings and reviews for that location. Where a pair of words has a significant measure of mutual information, the attribute identification module 211 counts the occurrence of each word in the pair towards the frequency count of the other word. For example, an occurrence of “vineyard” will count towards the frequency count of “winery” and vice versa. In this way, words that identify or describe the same attribute are grouped together.
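One way to realize this grouping is sketched below, assuming document-level co-occurrence (two words co-occur when they appear in the same listing or review) and pointwise mutual information, PMI = log(p(a,b) / (p(a) p(b))), as the "measure of mutual information". The PMI choice and the `threshold` value are assumptions made for illustration.

```python
import math
from collections import Counter
from itertools import combinations

def cooccurrence_pmi(documents):
    """PMI for word pairs that co-occur in the same document of one location.

    `documents` is a list of token lists; probabilities are document-level.
    """
    n_docs = len(documents)
    word_df = Counter()   # number of documents containing each word
    pair_df = Counter()   # number of documents containing each (sorted) pair
    for tokens in documents:
        unique = sorted(set(tokens))
        word_df.update(unique)
        pair_df.update(combinations(unique, 2))
    return {
        (a, b): math.log(n_ab * n_docs / (word_df[a] * word_df[b]))
        for (a, b), n_ab in pair_df.items()
    }

def merge_counts(counts, pmi, threshold=0.5):
    """Credit each occurrence of a word to its high-PMI partner, and vice versa.

    Uses the original counts on the right-hand side so credits do not cascade.
    """
    merged = Counter(counts)
    for (a, b), score in pmi.items():
        if score >= threshold:
            merged[a] += counts.get(b, 0)
            merged[b] += counts.get(a, 0)
    return merged
```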

To compute the descriptiveness metric of each unique word in the corpus, the attribute identification module 211 first determines a per-location frequency of the word. For each location for which listings are offered, the module 211 stores a location list of the unique words from the corpus that appear in listings and reviews for the location, along with a per-location frequency for each word. Thus, a location list of words is stored, for example, for San Francisco, Los Angeles, and so on. The per-location frequency of a given word in a location list indicates the number of times the word appears in listings and reviews of listings for that location. To be comparable with the expected frequency, this count can be normalized by the total number of words in the listings and reviews for that location.
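A minimal sketch of building location lists, assuming each location's listings and reviews are already tokenized. It keeps both the raw counts and counts normalized by the location's total word count; the normalization is an assumption made so that per-location values share a scale with $f(x)$.

```python
from collections import Counter

def build_location_lists(docs_by_location):
    """Map {location: [token list, ...]} to raw and normalized per-location frequencies."""
    raw, normalized = {}, {}
    for location, docs in docs_by_location.items():
        counts = Counter()
        for tokens in docs:
            counts.update(tokens)
        total = sum(counts.values())
        raw[location] = counts
        # Normalize so values are on the same scale as the expected frequency.
        normalized[location] = {w: n / total for w, n in counts.items()} if total else {}
    return raw, normalized
```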

Once the location lists are completed, they can be processed by iterating over each word to determine the frequency of that word in each location. Some words will have high frequency in a large number of locations. For example, “restaurant” or “bars” or “parks” will be found in listings and reviews of many different locations. Other words will have high frequency, but only in a small number of locations. These are words that are highly descriptive of particular locations. For example, “beach” would be very commonly used in descriptions of Los Angeles or San Diego, but would rarely appear in descriptions of Paris or London or Seattle. Still other words will have a low frequency, and appear in relatively few locations. These would be words that may be particular to specific locations, but are otherwise not commonly used. Thus, for each word in the location lists, the attribute identification module 211 then determines the number of locations that have a per-location frequency that is higher than a predetermined multiple of the expected frequency for that word. This number of locations is referred to herein as the “number of high frequency listing locations.” For example, the high frequency listing locations for a given word are those in which the word appears more than 4 times the expected frequency. More specifically, if the expected frequency of the word “bridge” is 0.025 (i.e., the word appears 25 times per thousand words), then a high frequency listing location would have the word “bridge” appearing with a frequency greater than 0.100 (100 times per thousand words).
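Counting high frequency listing locations then reduces to a threshold test per location. This sketch assumes normalized per-location frequencies stored as `{location: {word: frequency}}`; the function name is hypothetical and the default multiple of 4 follows the example above.

```python
def count_high_frequency_locations(word, expected, location_freqs, multiple=4.0):
    """N_h(x): locations whose per-location frequency of `word` exceeds
    `multiple` times the word's expected frequency."""
    threshold = multiple * expected[word]
    return sum(1 for freqs in location_freqs.values() if freqs.get(word, 0.0) > threshold)
```

With the "bridge" example, the threshold is 4 × 0.025 = 0.100, so a location where "bridge" has frequency 0.12 counts, while one at 0.05 does not.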

Similarly, the attribute identification module 211 also determines the number of locations that have a per-location frequency that is less than a different and smaller multiple of the expected frequency of the word. This number of locations is referred to herein as the “number of below expected frequency listing locations.” For example, the below expected frequency listing locations for a given word are those in which the word appears less than 0.4 times the expected frequency. In some embodiments, the two multiples are reciprocals (e.g., 4 and 0.25), but in other embodiments they need not be (e.g., 4 and 0.4). The attribute identification module 211 computes the descriptiveness metric based on a ratio of the number of high frequency listing locations to the number of below expected frequency listing locations. In one embodiment, the attribute identification module 211 computes the descriptiveness metric for a word using the following formula:

$D_{x} = \frac{N_{h{(x)}}}{N_{l{(x)}}}$

where $D_x$ is the descriptiveness metric of the word, $N_{h(x)}$ is the number of high frequency listing locations for the word, and $N_{l(x)}$ is the number of below expected frequency listing locations for the word. When the number of high frequency listing locations and the number of below expected frequency listing locations are similar in value, the value of the descriptiveness metric approaches 1, indicating that the word appears very frequently in some locations and very infrequently in a comparable number of other locations. This information is valuable as it is a measure of the uniqueness of a word to a location. Words whose descriptiveness metric is within a threshold range of 1 (e.g., $0.85 < D_x < 1$), where 1 is herein referred to as the “attribute reference number,” are deemed attributes and can be used to meaningfully describe locations. Thus, for each location, the descriptiveness metric for each word in the location list can be computed, and the N (e.g., 20) words with a metric value closest to 1 can be obtained and ranked (e.g., based on actual frequency of usage for the location). This creates a list of words for each location that are highly descriptive of the location.
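A sketch of the descriptiveness metric and attribute selection, assuming normalized per-location frequencies as input. Two choices here are assumptions not stated in the text: a word with $N_{l(x)} = 0$ receives an infinite metric (so it is never selected), and the selection window is `lower < D_x <= reference` so that a metric of exactly 1 is included.

```python
import math

def descriptiveness(word, expected, location_freqs, high_multiple=4.0, low_multiple=0.4):
    """D_x = N_h(x) / N_l(x) for one word."""
    f = expected[word]
    values = [freqs.get(word, 0.0) for freqs in location_freqs.values()]
    n_high = sum(1 for v in values if v > high_multiple * f)
    n_low = sum(1 for v in values if v < low_multiple * f)
    # Assumption: a word that is never below expected anywhere gets an infinite score.
    return n_high / n_low if n_low else math.inf

def select_attributes(expected, location_freqs, lower=0.85, reference=1.0):
    """Words whose metric falls in the window (lower, reference]."""
    return {
        w for w in expected
        if lower < descriptiveness(w, expected, location_freqs) <= reference
    }
```

In a toy setting with four locations, a "beach"-like word that is high in two locations and absent or rare in the other two scores 2/2 = 1 and is selected, while a "bar"-like word (high in three, low in one, D = 3) and a "Muni"-like word (high in one, low in three, D = 1/3) are rejected.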

As an example, assume the word ‘beach’ appears significantly more frequently in listings and reviews of listings in locations that are in close proximity to a beach than in listings and reviews of listings in locations that are not in close proximity to a beach. Therefore, the number of high frequency listing locations is likely to be greater than 1,000 (e.g., the number of locations where there are beaches), and the number of below expected frequency listing locations is likely also to be greater than 1,000 (e.g., the number of locations where there are no beaches). When these two numbers are comparable, computing a descriptiveness metric for the word ‘beach’ based on them provides a descriptiveness metric having a value that approaches 1. The word ‘beach,’ therefore, is determined to be an attribute that is meaningfully descriptive of a location that is in close proximity to a beach. This determination is likely consistent with a visitor's interests when visiting such a location.

In some cases, the number of high frequency listing locations is below a first threshold and the number of below expected frequency listing locations is above a second threshold, resulting in a descriptiveness metric that is much less than 1. In these cases, the word is filtered out automatically. For example, assume that the word ‘Muni’ appears significantly more often in listings and reviews of listings located in San Francisco but does not appear in listings and reviews of listings in other locations. Therefore, the number of high frequency listing locations is likely 1 (San Francisco, the location of the Muni public transportation system), and the number of below expected frequency listing locations is likely very high, since most if not all other locations do not have a public transportation system named ‘Muni.’ Here, 1 is below the first threshold for the number of high frequency listing locations, and the number of below expected frequency listing locations is likely above the second threshold. As a consequence, the word ‘Muni’ is determined not to be an attribute that is meaningfully descriptive of San Francisco. This determination is likely consistent with a visitor's interests when visiting San Francisco: the visitor is likely to be more interested in ‘waterfront’ or ‘crabs’ than ‘Muni.’

In other cases, the number of high frequency listing locations is above a third threshold and the number of below expected frequency listing locations is below a fourth threshold, resulting in a descriptiveness metric that is much greater than 1. In these cases, the word is also filtered out automatically. For example, the word ‘bar’ appears in many listings located in many different locations. This is expected, as bars are commonplace. As a result, the number of high frequency listing locations is likely very high, and the number of below expected frequency listing locations is likely very low, as most locations will have listings or reviews that mention that a location has bars. As a consequence, the word ‘bar’ is determined not to be an attribute that is meaningfully descriptive of any particular location.
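The two filtering cases can be expressed as a simple predicate on the two counts. The text does not give values for the first through fourth thresholds, so the values in the usage below are purely hypothetical.

```python
def passes_count_thresholds(n_high, n_low, t1, t2, t3, t4):
    """Return False when a word should be filtered out.

    too_local:   n_high below t1 and n_low above t2 (the 'Muni' case, D_x << 1)
    too_generic: n_high above t3 and n_low below t4 (the 'bar' case, D_x >> 1)
    """
    too_local = n_high < t1 and n_low > t2
    too_generic = n_high > t3 and n_low < t4
    return not (too_local or too_generic)
```

For instance, with hypothetical thresholds t1 = 5, t2 = 1000, t3 = 1000, and t4 = 50, counts of (1, 3000) for 'Muni' and (4000, 10) for 'bar' are both rejected, while (1200, 1100) for 'beach' passes.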

Once the descriptiveness metric for each unique word in the corpus is computed, the attribute identification module 211 identifies the words having a descriptiveness metric of 1 or close to 1 and deems those words as attributes. As discussed above, one or more attributes may be used to meaningfully describe a given location. The attribute identification module 211 identifies, for each location having a listing stored in database 201, a set of attributes that best describe the location. Specifically, for each location, the attribute identification module 211 processes the listings and reviews of listings located in the location and identifies the set of attributes that appear most frequently in the listings and reviews. For example, if the attributes include “skiing,” “surfing,” “beach,” “hiking,” “vistas,” “architecture,” and “nature,” the set of attributes that best describe San Francisco would likely be “vistas,” “architecture,” and “nature,” while the set of attributes that best describe South Lake Tahoe would be “skiing,” “hiking,” “vistas,” and “nature.”
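A minimal sketch of choosing each location's best attributes by local frequency. The per-location counts in the usage below are invented toy numbers for illustration, not measured data, and breaking ties alphabetically is an arbitrary choice.

```python
def top_attributes_by_location(location_counts, attributes, n=3):
    """For each location, rank the attributes that appear in its listings and
    reviews by how often they appear there, and keep the top n."""
    result = {}
    for location, counts in location_counts.items():
        present = [a for a in attributes if counts.get(a, 0) > 0]
        # Most frequent first; ties broken alphabetically for determinism.
        present.sort(key=lambda a: (-counts[a], a))
        result[location] = present[:n]
    return result
```

With toy counts in which San Francisco mentions "vistas" most, followed by "architecture" and "nature", and South Lake Tahoe mentions "skiing" most, followed by "hiking" and "vistas", the top three per location follow that order.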

Exemplary Method

FIG. 3 is a flow chart for processing listings and reviews to identify descriptive attributes for listing locations, according to one embodiment. In step 301, the attribute identification module 211 generates a corpus based on listings and reviews of listings stored in the database 201. The subsequent steps 303-307 are performed for each unique word in the corpus. In step 303, the attribute identification module 211 computes the expected frequency for a word. The expected frequency is determined based on the total number of times the word occurs in the corpus and the total number of words in the corpus. In step 305, the attribute identification module 211 determines a number of high frequency listing locations and a number of below expected frequency listing locations. The high frequency listing locations are those locations in which listings and reviews include the word at a higher frequency relative to the expected frequency. The below expected frequency listing locations are those locations in which listings and reviews include the word at a lower frequency relative to the expected frequency. In step 307, the attribute identification module 211 computes the descriptiveness metric for the word as the ratio of the number of high frequency listing locations to the number of below expected frequency listing locations.
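Steps 301 through 307 can be sketched end to end as follows. The sketch makes several assumptions that the description leaves open: words are lowercase whitespace tokens, per-location frequency is the word's count divided by that location's total word count, and "higher" and "lower" relative to the expected frequency are taken as at least `high_multiple` times and at most 1/`low_multiple` of it (both hypothetical parameters, echoing the first and second multiples in the claims). A location with no below expected frequency locations yields an infinite ratio.

```python
from collections import Counter
from typing import Dict, List


def descriptiveness_metrics(
    docs_by_location: Dict[str, List[str]],
    high_multiple: float = 2.0,  # hypothetical "first multiple"
    low_multiple: float = 2.0,   # hypothetical "second multiple"
) -> Dict[str, float]:
    """Map each unique corpus word to a descriptiveness metric (steps 301-307)."""
    # Step 301: build the corpus as per-location word counts.
    counts_by_location = {
        loc: Counter(w for doc in docs for w in doc.lower().split())
        for loc, docs in docs_by_location.items()
    }
    totals_by_location = {loc: sum(c.values()) for loc, c in counts_by_location.items()}
    corpus: Counter = Counter()
    for counts in counts_by_location.values():
        corpus.update(counts)
    total_words = sum(corpus.values())

    metrics: Dict[str, float] = {}
    for word, occurrences in corpus.items():
        # Step 303: expected frequency across the whole corpus.
        expected = occurrences / total_words
        n_high = n_below = 0
        # Step 305: classify each location by its per-location frequency.
        for loc, counts in counts_by_location.items():
            loc_total = totals_by_location[loc]
            freq = counts[word] / loc_total if loc_total else 0.0
            if freq >= high_multiple * expected:
                n_high += 1
            elif freq <= expected / low_multiple:
                n_below += 1
        # Step 307: ratio of high to below expected frequency locations.
        metrics[word] = n_high / n_below if n_below else float("inf")
    return metrics


# Hypothetical toy listings for four locations.
listings = {
    "A": ["beach sun beach"],
    "B": ["sun snow"],
    "C": ["sun city"],
    "D": ["city snow"],
}
metrics = descriptiveness_metrics(listings)
```

With these toy listings, ‘snow’ and ‘city’ are each frequent in two locations and absent from the other two, so both get a metric of 1.0. ‘beach’ is concentrated in one location and gets 1/3.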

In step 309, the attribute identification module 211 identifies the words having a descriptiveness metric equal to or within a threshold range of an attribute reference number and deems those words attributes. For example, the attribute reference number can be ‘1’, in which case words having a descriptiveness metric of ‘1’ or within the threshold range of ‘1’ are deemed attributes. As discussed above, one or more attributes may be used to meaningfully describe a given location. In step 311, the attribute identification module 211 identifies, for each location having a listing stored in database 201, a set of attributes that best describe the location. Specifically, for each location, the attribute identification module 211 processes the listings and reviews of listings located in the location and identifies the set of attributes that appear most frequently in the listings and reviews.
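Step 309 reduces to a range check around the attribute reference number. In the following sketch, the function name `select_attributes`, the symmetric interpretation of the threshold range, and the tolerance of 0.25 are assumptions made for illustration.

```python
from typing import Dict, List


def select_attributes(metrics: Dict[str, float], reference: float = 1.0,
                      tolerance: float = 0.25) -> List[str]:
    """Keep words whose descriptiveness metric lies within
    [reference - tolerance, reference + tolerance] (hypothetical helper)."""
    # An infinite metric (no below expected frequency locations) never qualifies.
    return sorted(w for w, m in metrics.items() if abs(m - reference) <= tolerance)


# Hypothetical metrics: 'muni' is too location-specific, 'bar' too generic.
example = {"snow": 1.0, "vistas": 0.9, "muni": 0.002, "bar": 150.0,
           "beach": 1 / 3, "lake": float("inf")}
print(select_attributes(example))  # ['snow', 'vistas']
```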

The attribute identification module 211 can receive a request for the attributes of a location. To respond, the module 211 identifies the subset of words in the corpus associated with the location and compares that subset against the list of attributes, resulting in a list of attributes applicable to that location. This comparison may be performed, for example, by simple string comparison. The identified list of attributes is provided in response to the request.
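The request handling above, using simple string comparison, can be sketched as a set membership test. This is a hedged sketch: the name `attributes_for_location` and the case-insensitive matching are assumptions, not details from the description.

```python
from typing import Iterable, List


def attributes_for_location(location_words: Iterable[str],
                            attributes: Iterable[str]) -> List[str]:
    """Return the attributes that appear among a location's corpus words
    (hypothetical helper; matching is case-insensitive)."""
    present = {w.lower() for w in location_words}
    # Preserve the order of the attribute list in the response.
    return [a for a in attributes if a.lower() in present]


words = ["Vistas", "from", "the", "nature", "trail"]
print(attributes_for_location(words, ["skiing", "vistas", "nature", "surfing"]))
# ['vistas', 'nature']
```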

Exemplary Uses of Identified Attributes

The attributes identified as meaningfully describing a particular location may be used in a variety of ways. For example, metadata related to the location may be augmented with the identified attributes. As another example, the attributes may be displayed to users booking listings in the locations or generally browsing listings.

FIGS. 4A and 4B illustrate exemplary interfaces that use the attributes identified as meaningfully describing a location. Specifically, FIG. 4A illustrates user interface tiles associated with different locations. For example, tile 402 is associated with the city Austin. In some cases, the tile associated with a given location includes an attribute of the location. For example, tile 404 associated with Santa Cruz includes the attribute “sun” that is deemed to be descriptive of Santa Cruz. This attribute is identified using the process described above in conjunction with FIGS. 2 and 3. Similarly, FIG. 4B illustrates a discovery user interface that includes a single tile 406 associated with South Lake Tahoe. The tile 406 includes the attribute “nature” that is deemed to be descriptive of South Lake Tahoe. Again, this attribute is identified using the process described above in conjunction with FIGS. 2 and 3.

Additional Considerations

The foregoing description of the embodiments of the invention has been presented for the purpose of illustration; it is not intended to be exhaustive or to limit the invention to the precise forms disclosed. Persons skilled in the relevant art can appreciate that many modifications and variations are possible in light of the above disclosure.

Some portions of this description describe the embodiments of the invention in terms of algorithms and symbolic representations of operations on information. These algorithmic descriptions and representations are commonly used by those skilled in the data processing arts to convey the substance of their work effectively to others skilled in the art. These operations, while described functionally, computationally, or logically, are understood to be implemented by computer programs or equivalent electrical circuits, microcode, or the like. Furthermore, it has also proven convenient at times to refer to these arrangements of operations as modules, without loss of generality. The described operations and their associated modules may be embodied in software, firmware, hardware, or any combinations thereof.

Any of the steps, operations, or processes described herein may be performed or implemented with one or more hardware or software modules, alone or in combination with other devices. In one embodiment, a software module is implemented with a computer program product comprising a computer-readable medium containing computer program code, which can be executed by a computer processor for performing any or all of the steps, operations, or processes described.

Embodiments of the invention may also relate to an apparatus for performing the operations herein. This apparatus may be specially constructed for the required purposes, and/or it may comprise a general-purpose computing device selectively activated or reconfigured by a computer program stored in the computer. Such a computer program may be stored in a non-transitory, tangible computer readable storage medium, or any type of media suitable for storing electronic instructions, which may be coupled to a computer system bus. Furthermore, any computing systems referred to in the specification may include a single processor or may be architectures employing multiple processor designs for increased computing capability.

Embodiments of the invention may also relate to a product that is produced by a computing process described herein. Such a product may comprise information resulting from a computing process, where the information is stored on a non-transitory, tangible computer readable storage medium and may include any embodiment of a computer program product or other data combination described herein.

Finally, the language used in the specification has been principally selected for readability and instructional purposes, and it may not have been selected to delineate or circumscribe the inventive subject matter. It is therefore intended that the scope of the invention be limited not by this detailed description, but rather by any claims that issue on an application based hereon. Accordingly, the disclosure of the embodiments of the invention is intended to be illustrative, but not limiting, of the scope of the invention, which is set forth in the following claims.

What is claimed is:
 1. A method comprising: generating a corpus of words present in listings and reviews of the listings, the listings describing goods or services, each listing associated with a location of a plurality of locations; for each of the words in the corpus: computing an expected frequency for a word to appear in the corpus, determining, for each of the locations, a per-location frequency for the word, determining a number of high frequency listing locations comprising locations where the per-location frequency of the word is a first multiple greater than the expected frequency, determining a number of low frequency listing locations comprising locations where the per-location frequency of the word is a second multiple smaller than the expected frequency, and determining a descriptiveness metric for the word based on the number of high frequency listing locations and the number of low frequency listing locations; identifying, as attributes, one or more words in the set of words having a descriptiveness metric within a threshold range of an attribute reference number; generating for display a graphical user interface (GUI) comprising a search query bar and icons for each of a default subset of locations of the plurality of locations; receiving, via the search query bar of the GUI, a user search query specifying an attribute of interest; determining an updated subset of the plurality of locations, the updated subset comprising locations of the plurality of locations that have the attribute of interest as an identified attribute; and updating the GUI to replace the icons for each of the default subset of locations with icons for each of the updated subset of the plurality of locations, the icons including a display of, for each respective location of the updated subset, the respective one or more words having the descriptiveness metric within the threshold range of the attribute reference number.
 2. The method of claim 1, wherein the expected frequency is based on a total number of times the word occurs in the corpus and a total number of words in the corpus.
 3. The method of claim 1, wherein the per-location frequency is based on a total number of times the word occurs in listings associated with the location.
 4. The method of claim 1, wherein the descriptiveness metric is a ratio of the number of high frequency listing locations to the number of low frequency listing locations.
 5. The method of claim 1, wherein the descriptiveness metric is a numerical value that represents how descriptive a word is of a location relative to the other words in the corpus.
 6. The method of claim 1, wherein the attribute reference number is 1.
 7. The method of claim 1, wherein the words in the corpus comprise bigrams and trigrams.
 8. The method of claim 1, wherein the expected frequency is based on a total number of times the word occurs in the corpus, a total number of times other words semantically similar to the word occur in the corpus, and a total number of words in the corpus.
 9. The method of claim 1, further comprising: receiving a request for attributes of one of the locations; identifying a subset of the corpus comprising words present in listings and reviews of the listings associated with the location; comparing the attributes against the subset of words to determine a list of attributes for the location; and providing the list of attributes for the location in response to the request.
 10. The method of claim 9, wherein comparing the attributes against the subset of words to determine the list of attributes for the location comprises: identifying which of the attributes are present as words in the subset of the corpus.
 11. A non-transitory computer readable storage medium comprising instructions that when executed by at least one processor cause the processor to: generate a corpus of words present in listings and reviews of the listings, the listings describing goods or services, each listing associated with a location of a plurality of locations; for each of the words in the corpus: compute an expected frequency for a word to appear in the corpus, determine, for each of the locations, a per-location frequency for the word, determine a number of high frequency listing locations comprising locations where the per-location frequency of the word is a first multiple greater than the expected frequency, determine a number of low frequency listing locations comprising locations where the per-location frequency of the word is a second multiple smaller than the expected frequency, and determine a descriptiveness metric for the word based on the number of high frequency listing locations and the number of low frequency listing locations; identify, as attributes, one or more words in the set of words having a descriptiveness metric within a threshold range of an attribute reference number; generate for display a graphical user interface (GUI) comprising a search query bar and icons for each of a default subset of locations of the plurality of locations; receive, via the search query bar of the GUI, a user search query specifying an attribute of interest; determine an updated subset of the plurality of locations, the updated subset comprising locations of the plurality of locations that have the attribute of interest as an identified attribute; and update the GUI to replace the icons for each of the default subset of locations with icons for each of the updated subset of the plurality of locations, the icons including a display of, for each respective location of the updated subset, the respective one or more words having the descriptiveness metric within the threshold range of the attribute reference number.
 12. The non-transitory computer readable storage medium of claim 11, wherein the expected frequency is based on a total number of times the word occurs in the corpus and a total number of words in the corpus.
 13. The non-transitory computer readable storage medium of claim 11, wherein the per-location frequency is based on a total number of times the word occurs in listings associated with the location.
 14. The non-transitory computer readable storage medium of claim 11, wherein the descriptiveness metric is a ratio of the number of high frequency listing locations to the number of low frequency listing locations.
 15. The non-transitory computer readable storage medium of claim 11, wherein the descriptiveness metric is a numerical value that represents how descriptive a word is of a location relative to the other words in the corpus.
 16. The non-transitory computer readable storage medium of claim 11, wherein the attribute reference number is 1.
 17. The non-transitory computer readable storage medium of claim 11, wherein the words in the corpus comprise bigrams and trigrams.
 18. The non-transitory computer readable storage medium of claim 11, wherein the expected frequency is based on a total number of times the word occurs in the corpus, a total number of times other words semantically similar to the word occur in the corpus, and a total number of words in the corpus.
 19. The non-transitory computer readable storage medium of claim 11, wherein the instructions further cause the processor to: receive a request for attributes of one of the locations; identify a subset of the corpus comprising words present in listings and reviews of the listings associated with the location; compare the attributes against the subset of words to determine a list of attributes for the location; and provide the list of attributes for the location in response to the request.
 20. The non-transitory computer readable storage medium of claim 19, wherein the instructions further cause the at least one processor, when comparing the attributes against the subset of words to determine the list of attributes for the location, to: identify which of the attributes are present as words in the subset of the corpus.